perf: cache or index TestSuiteGate relevant-test discovery#143
perf: cache or index TestSuiteGate relevant-test discovery#143shrutu0929 wants to merge 2 commits intoRefactron-ai:mainfrom
Conversation
📝 WalkthroughWalkthroughAdded a module-level test discovery cache and implemented a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
refactron/verification/checks/test_gate.py (1)
73-80: 🛠️ Refactor suggestion | 🟠 MajorIntegration guidance: Merge caching into the existing class.
To properly implement this PR's caching optimization:
- Keep the existing
TestSuiteGate(BaseCheck)class as the single definition- Add the
_TEST_DISCOVERY_CACHEat module level (after existing imports)- Modify the existing
_find_relevant_testsmethod (line 166) to consult/update the cache- Add the
ignore_patternsfiltering to the existing method- Keep the existing
_imports_moduleexact-matching logic (line 186)This preserves API compatibility and maintains the
BaseCheckcontract.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@refactron/verification/checks/test_gate.py` around lines 73 - 80, Keep the single TestSuiteGate(BaseCheck) class and add a module-level cache named _TEST_DISCOVERY_CACHE (defined after imports). In the existing TestSuiteGate._find_relevant_tests method, consult _TEST_DISCOVERY_CACHE keyed by relevant state (e.g., changed file path or module name) to return cached test lists when present, and update the cache after discovering tests. Also apply ignore_patterns filtering inside _find_relevant_tests before returning/updating the cache. Do not change the exact-matching logic in TestSuiteGate._imports_module; leave that method as-is.
🧹 Nitpick comments (1)
refactron/verification/checks/test_gate.py (1)
53-58: Cache stores Path objects without invalidation strategy.The cache stores
Pathobjects that could become stale if test files are added, modified, or deleted between runs. Consider:
- Storing modification timestamps and checking for staleness
- Adding a TTL-based expiration
- Providing a
clear_cache()function for explicit invalidation🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@refactron/verification/checks/test_gate.py` around lines 53 - 58, The cache (_TEST_DISCOVERY_CACHE) currently stores Path objects for a cache_key/target_module pair and can become stale; modify the logic around the caching in the function that writes/reads _TEST_DISCOVERY_CACHE so that before returning cached relevant_tests you validate freshness—store file mtimes (or a hash) alongside the Path entries and compare current mtimes to detect changes, or attach a TTL timestamp and expire entries when older than the TTL; additionally add a public clear_cache() helper that empties _TEST_DISCOVERY_CACHE for explicit invalidation and ensure the function that populates the cache updates the stored metadata whenever relevant_tests are recalculated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@refactron/verification/checks/test_gate.py`:
- Around line 42-51: The loop scanning files via search_root.rglob("*.py")
incorrectly applies the ignore_patterns only when project_root is set and only
recognizes test files with a "test_" prefix; update the logic so the ignore
check (any(part in ignore_patterns for part in p.parts)) runs unconditionally
for every candidate from search_root.rglob, and broaden the test-file detection
to accept either p.name.startswith("test_") or p.name.endswith("_test.py"); keep
the existing p.is_file() and _imports_module(p, target_module) checks and append
matching paths to relevant_tests as before.
- Around line 21-27: The import-check currently uses substring matching
(target_module in alias.name and target_module in node.module) which causes
false positives; update the checks in the ast.Import and ast.ImportFrom branches
to use exact-module or submodule matching instead: replace "target_module in
alias.name" with "alias.name == target_module or
alias.name.startswith(target_module + '.')", and replace "target_module in
node.module" with "node.module == target_module or
node.module.startswith(target_module + '.')", keeping the rest of the logic
around ast.Import, ast.ImportFrom, alias.name and node.module unchanged.
- Around line 10-14: Remove the duplicate TestSuiteGate class and merge its
caching behavior into the existing TestSuiteGate that inherits from BaseCheck;
specifically, delete the standalone class definition that defines __init__(self,
search_root: Path, project_root: Optional[Path] = None) and instead incorporate
any cache fields and logic into the existing TestSuiteGate(BaseCheck)
implementation (keep its original constructor signature and API), update its
__init__ to initialize cache using search_root/project_root values without
changing parameters expected by callers, and ensure the existing verify(self,
...) method implements the new caching checks so the class continues to satisfy
the BaseCheck contract used by verification/engine.py.
- Line 59: The module docstring string literal """TestSuiteGate — Check 3: run
relevant tests against transformed code.""" is not at the top of the file; move
that docstring to be the very first statement in the module (before any imports)
and remove the unintended prepended lines that pushed it down (the stray content
added above the imports), ensuring the file begins with the docstring and then
the imports and definitions (e.g., classes/functions in this module such as
TestSuiteGate or any test-related functions).
- Around line 28-30: The code currently swallows all exceptions in the except
Exception: pass block (returning False), which hides parsing/encoding errors;
change the handler to except Exception as e: and log the exception (for example
using logging.exception(...) or a module-level logger.exception(...)) before
returning False so errors during AST parsing in test_gate.py are visible; ensure
the logging call includes context (e.g., "failed to evaluate test gate" and the
exception).
- Around line 1-8: The new caching code was prepended and created duplicate
imports and a second TestSuiteGate definition; remove the duplicated top-level
imports and the extra TestSuiteGate class and integrate the caching logic into
the existing TestSuiteGate class (use the class's existing methods that discover
tests to consult/update the cache named _TEST_DISCOVERY_CACHE rather than a
separate block). Replace the unbounded module-level dict by using
functools.lru_cache or a bounded eviction structure (or wrap cache accesses with
a max-size policy) and ensure all references use the same _TEST_DISCOVERY_CACHE
symbol inside TestSuiteGate so there is a single canonical implementation.
---
Outside diff comments:
In `@refactron/verification/checks/test_gate.py`:
- Around line 73-80: Keep the single TestSuiteGate(BaseCheck) class and add a
module-level cache named _TEST_DISCOVERY_CACHE (defined after imports). In the
existing TestSuiteGate._find_relevant_tests method, consult
_TEST_DISCOVERY_CACHE keyed by relevant state (e.g., changed file path or module
name) to return cached test lists when present, and update the cache after
discovering tests. Also apply ignore_patterns filtering inside
_find_relevant_tests before returning/updating the cache. Do not change the
exact-matching logic in TestSuiteGate._imports_module; leave that method as-is.
---
Nitpick comments:
In `@refactron/verification/checks/test_gate.py`:
- Around line 53-58: The cache (_TEST_DISCOVERY_CACHE) currently stores Path
objects for a cache_key/target_module pair and can become stale; modify the
logic around the caching in the function that writes/reads _TEST_DISCOVERY_CACHE
so that before returning cached relevant_tests you validate freshness—store file
mtimes (or a hash) alongside the Path entries and compare current mtimes to
detect changes, or attach a TTL timestamp and expire entries when older than the
TTL; additionally add a public clear_cache() helper that empties
_TEST_DISCOVERY_CACHE for explicit invalidation and ensure the function that
populates the cache updates the stored metadata whenever relevant_tests are
recalculated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9b4e40d5-cc44-4733-94a6-658cc1b005ec
📒 Files selected for processing (1)
refactron/verification/checks/test_gate.py
| import ast | ||
| from pathlib import Path | ||
| from typing import Dict, List, Optional | ||
|
|
||
| # Global cache for module -> relevant tests mapping | ||
| # Structure: {project_root_str: {target_module: [test_paths]}} | ||
| _TEST_DISCOVERY_CACHE: Dict[str, Dict[str, List[Path]]] = {} | ||
|
|
There was a problem hiding this comment.
Critical: New code was prepended without integrating with existing file content.
The pipeline failures (F811, E402) confirm this file now has duplicate imports and a duplicate TestSuiteGate class definition. The new caching logic in lines 1-58 must be merged into the existing class at line 73, not added as a separate block.
Additionally, the module-level _TEST_DISCOVERY_CACHE is unbounded and will grow indefinitely in long-running processes. Consider adding a max-size eviction policy or using functools.lru_cache.
🧰 Tools
🪛 GitHub Actions: Pre-commit
[error] 1-1: pre-commit hook black failed and modified files (reformatted refactron/verification/checks/test_gate.py).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/verification/checks/test_gate.py` around lines 1 - 8, The new
caching code was prepended and created duplicate imports and a second
TestSuiteGate definition; remove the duplicated top-level imports and the extra
TestSuiteGate class and integrate the caching logic into the existing
TestSuiteGate class (use the class's existing methods that discover tests to
consult/update the cache named _TEST_DISCOVERY_CACHE rather than a separate
block). Replace the unbounded module-level dict by using functools.lru_cache or
a bounded eviction structure (or wrap cache accesses with a max-size policy) and
ensure all references use the same _TEST_DISCOVERY_CACHE symbol inside
TestSuiteGate so there is a single canonical implementation.
| class TestSuiteGate: | ||
| def __init__(self, search_root: Path, project_root: Optional[Path] = None): | ||
| self.search_root = search_root | ||
| self.project_root = project_root | ||
|
|
There was a problem hiding this comment.
Critical: Duplicate class definition missing BaseCheck inheritance.
This TestSuiteGate class (line 10) conflicts with the existing TestSuiteGate(BaseCheck) at line 73. The new class:
- Does not inherit from
BaseCheck, violating the verification engine contract - Changes the constructor signature (adds required
search_rootparameter), breaking API compatibility - Lacks the
verify()method required byBaseCheck
The caching logic should be integrated into the existing class at line 73, not defined as a separate class.
Based on learnings: "BaseCheck ABC must have 3 implementations: SyntaxVerifier, ImportIntegrityVerifier, and TestSuiteGate in verification/engine.py"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/verification/checks/test_gate.py` around lines 10 - 14, Remove the
duplicate TestSuiteGate class and merge its caching behavior into the existing
TestSuiteGate that inherits from BaseCheck; specifically, delete the standalone
class definition that defines __init__(self, search_root: Path, project_root:
Optional[Path] = None) and instead incorporate any cache fields and logic into
the existing TestSuiteGate(BaseCheck) implementation (keep its original
constructor signature and API), update its __init__ to initialize cache using
search_root/project_root values without changing parameters expected by callers,
and ensure the existing verify(self, ...) method implements the new caching
checks so the class continues to satisfy the BaseCheck contract used by
verification/engine.py.
| if isinstance(node, ast.Import): | ||
| for alias in node.names: | ||
| if target_module in alias.name: | ||
| return True | ||
| elif isinstance(node, ast.ImportFrom): | ||
| if node.module and target_module in node.module: | ||
| return True |
There was a problem hiding this comment.
Bug: Substring matching produces false positives.
Using target_module in alias.name (lines 23, 26) matches any import containing the target as a substring. For example, searching for "util" would incorrectly match "utilities", "string_util", or "my_utility_module".
The existing implementation at line 197-201 correctly uses exact matching:
# Correct approach (from existing code)
if alias.name == module_name or alias.name.startswith(module_name + "."):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/verification/checks/test_gate.py` around lines 21 - 27, The
import-check currently uses substring matching (target_module in alias.name and
target_module in node.module) which causes false positives; update the checks in
the ast.Import and ast.ImportFrom branches to use exact-module or submodule
matching instead: replace "target_module in alias.name" with "alias.name ==
target_module or alias.name.startswith(target_module + '.')", and replace
"target_module in node.module" with "node.module == target_module or
node.module.startswith(target_module + '.')", keeping the rest of the logic
around ast.Import, ast.ImportFrom, alias.name and node.module unchanged.
| except Exception: | ||
| pass | ||
| return False |
There was a problem hiding this comment.
Avoid silent exception swallowing.
Catching all exceptions and silently passing hides AST parsing errors, encoding issues, and other problems that would be valuable for debugging. At minimum, log the exception.
Proposed fix
+ import logging
+ logger = logging.getLogger(__name__)
+ # ...
except Exception:
- pass
+ logger.debug("Failed to parse %s for import detection", test_file, exc_info=True)
return False🧰 Tools
🪛 Ruff (0.15.7)
[error] 28-29: try-except-pass detected, consider logging the exception
(S110)
[warning] 28-28: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/verification/checks/test_gate.py` around lines 28 - 30, The code
currently swallows all exceptions in the except Exception: pass block (returning
False), which hides parsing/encoding errors; change the handler to except
Exception as e: and log the exception (for example using logging.exception(...)
or a module-level logger.exception(...)) before returning False so errors during
AST parsing in test_gate.py are visible; ensure the logging call includes
context (e.g., "failed to evaluate test gate" and the exception).
| for p in self.search_root.rglob("*.py"): | ||
| # Skip obvious non-target paths if project_root is set | ||
| if self.project_root: | ||
| # Check if any parent directory is in the ignore list | ||
| if any(part in ignore_patterns for part in p.parts): | ||
| continue | ||
|
|
||
| if p.is_file() and p.name.startswith("test_"): | ||
| if self._imports_module(p, target_module): | ||
| relevant_tests.append(p) |
There was a problem hiding this comment.
Logic issues: Inconsistent filtering and incomplete test file detection.
-
Inconsistent ignore filtering (lines 44-47): The
ignore_patternsfilter only applies whenproject_rootis set, butrglobalways scans fromsearch_root. This means virtual environments and cache directories are scanned whenproject_rootisNone. -
Missing test file pattern (line 49): Only checks for
test_prefix, but the existing implementation at line 174 also matches*_test.pysuffix:if not (name.startswith("test_") or name.endswith("_test.py")):
Proposed fix for test pattern
- if p.is_file() and p.name.startswith("test_"):
+ if p.is_file() and (p.name.startswith("test_") or p.name.endswith("_test.py")):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/verification/checks/test_gate.py` around lines 42 - 51, The loop
scanning files via search_root.rglob("*.py") incorrectly applies the
ignore_patterns only when project_root is set and only recognizes test files
with a "test_" prefix; update the logic so the ignore check (any(part in
ignore_patterns for part in p.parts)) runs unconditionally for every candidate
from search_root.rglob, and broaden the test-file detection to accept either
p.name.startswith("test_") or p.name.endswith("_test.py"); keep the existing
p.is_file() and _imports_module(p, target_module) checks and append matching
paths to relevant_tests as before.
|
|
||
| _TEST_DISCOVERY_CACHE[cache_key][target_module] = relevant_tests | ||
| return relevant_tests | ||
| """TestSuiteGate — Check 3: run relevant tests against transformed code.""" |
There was a problem hiding this comment.
Module docstring is misplaced due to code prepending.
The module docstring should be at the top of the file (before imports). Its current position confirms that lines 1-58 were incorrectly prepended to the existing file content.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/verification/checks/test_gate.py` at line 59, The module docstring
string literal """TestSuiteGate — Check 3: run relevant tests against
transformed code.""" is not at the top of the file; move that docstring to be
the very first statement in the module (before any imports) and remove the
unintended prepended lines that pushed it down (the stray content added above
the imports), ensuring the file begins with the docstring and then the imports
and definitions (e.g., classes/functions in this module such as TestSuiteGate or
any test-related functions).
solve #139
The recent commit introduces a performance optimization for test file discovery within the TestSuiteGate._find_relevant_tests method in order to reduce latency in large repositories. We implemented an in-memory caching mechanism that maps either the project_root or search_root to specific target modules, storing their uniquely associated test files. When a test discovery is requested, the system first consults this cache to immediately yield previously resolved paths. If the cache is unpopulated, it performs the standard file scan but with an added refinement: it intelligently skips scanning obvious non-target directories (such as virtual environments, .pytest_cache, and site-packages) to further minimize disk I/O. The parsed and verified test paths are then saved to the cache, eliminating the overhead of continuously re-reading and parsing the Abstract Syntax Tree (AST) for unchanged code on subsequent executions.
Summary by CodeRabbit